
UCL Discovery

Digital.CSIC

Coding potential of the products of alternative splicing in human

Author: A Mortazavi
Anna Tramontano
B Boeckmann
Domenico Raimondo
E Melamud
ET Wang
EV Kriventseva
EW Deutsch
F Beaussart
F Birzele
Fabrizio Ferrè
Guido Leoni
HM Berman
J Stetefeld
J Söding
L Cavallo
Loredana Le Pera
M Floris
M Sultan
ML Tress
N Eswar
N Pattabiraman
NR Voss
P Blakeley
P Mallick
PJ Gardina
Q Pan
Robert D. Finn
S Tanner
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Analysis of the human genome has revealed that as much as an order of magnitude more of the genomic sequence is transcribed than accounted for by the predicted and characterized genes. A number of these transcripts are alternatively spliced forms of known protein coding genes; however, it is becoming clear that many of them do not necessarily correspond to a functional protein. Results: In this study we analyze alternative splicing isoforms of human gene products that are unambiguously identified by mass spectrometry and compare their properties with those of isoforms of the same genes for which no peptide was found in publicly available mass spectrometry datasets. We analyze them in detail for the presence of uninterrupted functional domains, active sites as well as the plausibility of their predicted structure. We report how well each of these strategies and their combination can correctly identify translated isoforms and derive a lower limit for their specificity, that is, their ability to correctly identify non-translated products. Conclusions: The most effective strategy for correctly identifying translated products relies on the conservation of active sites, but it can only be applied to a small fraction of isoforms, while a reasonably high coverage, sensitivity and specificity can be achieved by analyzing the presence of non-truncated functional domains. Combining the latter with an assessment of the plausibility of the modeled structure of the isoform increases both coverage and specificity with a moderate cost in terms of sensitivity

Springer - Publisher Connector

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio della ricerca- Università di Roma La Sapienza

Alternative splicing and the evolution of phenotypic novelty

Author: Altschmied J
Araxi O. Urrutia
Jaime M. Tovar-Corona
Lu Chen
Lynch M
Stephen J. Bush
Tress ML
Ward AJ
Zhang C
Publication venue: 'The Royal Society'
Publication date: 01/01/2016

Alternative splicing, a mechanism of post-transcriptional RNA processing whereby a single gene can encode multiple distinct transcripts, has been proposed to underlie morphological innovations in multicellular organisms. Genes with developmental functions are enriched for alternative splicing events, suggestive of a contribution of alternative splicing to developmental programmes. The role of alternative splicing as a source of transcript diversification has previously been compared to that of gene duplication, with the relationship between the two extensively explored. Alternative splicing is reduced following gene duplication with the retention of duplicate copies higher for genes which were alternatively spliced prior to duplication. Furthermore, and unlike the case for overall gene number, the proportion of alternatively spliced genes has also increased in line with the evolutionary diversification of cell types, suggesting alternative splicing may contribute to the complexity of developmental programmes. Together these observations suggest a prominent role for alternative splicing as a source of functional innovation. However, it is unknown whether the proliferation of alternative splicing events indeed reflects a functional expansion of the transcriptome or instead results from weaker selection acting on larger species, which tend to have a higher number of cell types and lower population sizes. This article is part of the themed issue 'Evo-devo in the genomics era, and the origins of morphological diversity'.

Edinburgh Research Explorer

Oxford University Research Archive

High-Throughput Proteomics Detection of Novel Splice Isoforms in Human Platelets

Author: A Keller
AJ Atkinson
Andreas de Stefani
B Bodenmiller
B Modrek
BJ Blencowe
BP Lewis
C Sugnet
Cathal Seoighe
D Brett
DB Constam
E Durr
E Kim
ET Wang
F Clark
F Mo
GS Wang
H Liu
H Schwertz
H Yu
J Marioni
J Mestres
James P. McRedmond
JK Eng
JP McRedmond
Karen A. Power
M Matarin
M Sultan
MC Ozelo
ML Tress
MM Denis
O Holtkotter
P Censarek
P Harrison
P Mallick
P Rustin
Peadar Ó Gaora
PJ Kersey
R Sorek
S Draghici
S Mathivanan
S Stamm
S Tanner
SA Newland
SA Santoro
TA Thanaraj
TJP Hubbard
William M. Gallagher
Publication venue: Public Library of Science
Publication date: 01/01/2009

Alternative splicing (AS) is an intrinsic regulatory mechanism of all metazoans. Recent findings suggest that 100% of multiexonic human genes give rise to splice isoforms. AS can be specific to tissue type, environment or developmentally regulated. Splice variants have also been implicated in various diseases including cancer. Detection of these variants will enhance our understanding of the complexity of the human genome and provide disease-specific and prognostic biomarkers. We adopted a proteomics approach to identify exon skip events - the most common form of AS. We constructed a database harboring the peptide sequences derived from all hypothetical exon skip junctions in the human genome. Searching tandem mass spectrometry (MS/MS) data against the database allows the detection of exon skip events, directly at the protein level. Here we describe the application of this approach to human platelets, including the mRNA-based verification of novel splice isoforms of ITGA2, NPEPPS and FH. This methodology is applicable to all new or existing MS/MS datasets

CiteSeerX

Central Archive at the University of Reading

The fitness cost of mis-splicing is the main determinant of alternative splicing patterns

Author: A Reyes
AK Ramani
Alexandra Popa
Anamaria Necsulea
Baptiste Saudemont
BJ Blencowe
BR Graveley
C Trapnell
Corinne Blugeon
CR Edwards
E Dubois
E Kim
E Melamud
Eric Meyer
F Abascal
FM Hamid
G Drechsel
I Ezkurdia
J Beisson
J Merkin
J Weischenfeldt
J-M Aury
JE Smith
JJ-L Wong
JJL Wong
JK Pickrell
JM Mudge
Joanna L. Parmley
JZ Ni
L Duret
Laurent Duret
LF Lareau
M Bulmer
M Graille
M Irimia
M Kalyna
M Wang
ML Tress
MW-L Popp
N Stepankiw
NJ McGlincy
NL Barbosa-Morais
O Garnier
O Jaillon
O Kelemen
PL Boutz
RGH Lindeboom
The 1000 Genomes Project Consortium
TW Nilsen
U Braunschweig
Vincent Rocher
W Sung
Y Ge
Y Marquez
Z Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Background Most eukaryotic genes are subject to alternative splicing (AS), which may contribute to the production of protein variants or to the regulation of gene expression via nonsense-mediated messenger RNA (mRNA) decay (NMD). However, a fraction of splice variants might correspond to spurious transcripts and the question of the relative proportion of splicing errors to functional splice variants remains highly debated. Results We propose a test to quantify the fraction of AS events corresponding to errors. This test is based on the fact that the fitness cost of splicing errors increases with the number of introns in a gene and with expression level. We analyzed the transcriptome of the intron-rich eukaryote Paramecium tetraurelia. We show that in both normal and in NMD-deficient cells, AS rates strongly decrease with increasing expression level and with increasing number of introns. This relationship is observed for AS events that are detectable by NMD as well as for those that are not, which invalidates the hypothesis of a link with the regulation of gene expression. Our results show that in genes with a median expression level, 92–98% of observed splice variants correspond to errors. We observed the same patterns in human transcriptomes and we further show that AS rates correlate with the fitness cost of splicing errors. Conclusions These observations indicate that genes under weaker selective pressure accumulate more maladaptive substitutions and are more prone to splicing errors. Thus, to a large extent, patterns of gene expression variants simply reflect the balance between selection, mutation, and drift

ZENODO

INRIA a CCSD electronic archive server

HAL-Inserm

HAL Descartes

Assessment of orthologous splicing isoforms in human and mouse orthologous genes

Author: A Mortazavi
A Riva
AV Alekseyenko
BB Wang
C Trapnell
Carmela Gissi
David S Horner
DB Malko
E Kim
E Melamud
ET Wang
Federico Zambelli
G Pavesi
G Pesole
Giulio Pavesi
Graziano Pesole
H Keren
H Pearson
J Takeda
JA Calarco
JC Bourdon
M Irimia
M Mangiulli
M Sultan
MB Gerstein
ML Tress
P Bonizzoni
P Coggill
Q Pan
R Waltereit
RN Nurtdinov
ST Runyon
SW Roy
T Castrignano
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background Recent discoveries have highlighted the fact that alternative splicing and alternative transcripts are the rule, rather than the exception, in metazoan genes. Since multiple transcript and protein variants expressed by the same gene are, by definition, structurally distinct and need not be functionally equivalent, the concept of gene orthology should be extended to the transcript level in order to describe evolutionary relationships between structurally similar transcript variants. In other words, the identification of true orthology relationships between gene products now should progress beyond primary sequence, and "splicing orthology", consisting in ancestrally shared exon-intron structures, is required to define orthologous isoforms at transcript level. Results As a starting step in this direction, in this work we performed a large-scale human-mouse gene comparison with a twofold goal: first, to assess whether and to what extent traditional gene annotations such as RefSeq capture genuine splicing orthology; second, to provide a more detailed annotation and quantification of true human-mouse orthologous transcripts, defined as transcripts of orthologous genes exhibiting the same splicing patterns. Conclusions We observed an identical exon/intron structure for 32% of human and mouse orthologous genes. This figure increases to 87% using less stringent criteria for gene structure similarity, thus implying that for about 13% of the human RefSeq annotated genes (and about 25% of the corresponding transcripts) we could not identify any mouse transcript showing sufficient similarity to be confidently assigned as a splicing ortholog. Our data suggest that current gene and transcript data may still be rather incomplete, with several splicing variants still unknown.
The observation that alternative splicing produces large numbers of alternative transcripts and proteins, some of them conserved across species and others truly species-specific, suggests that, still maintaining the conventional definition of gene orthology, a new concept of "splicing orthology" can be defined at transcript level.

AIR Universita degli studi di Milano

Springer - Publisher Connector

Archivio istituzionale della ricerca - Università di Bari

Probing Metagenomics by Rapid Cluster Analysis of Very Large Datasets

Author: A Krogh
A Lupas
A Sali
AC McHardy
Adam Godzik
AJ Enright
B Rodriguez-Brito
BE Suzek
David Jones
DB Rusch
DH Huson
EF DeLong
FE Angly
G Yona
GW Tyson
J Park
JA Cuff
JC Venter
JD Bendtsen
JD Thompson
John C. Wooley
K Mavromatis
L Holm
L Krause
L Rychlewski
ML Tress
O Sasson
P Pipenbacher
PD Schloss
R Apweiler
RL Tatusov
S Mika
S Yooseph
SF Altschul
SG Tringe
SR Eddy
SR Gill
U Hobohm
W Li
Weizhong Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size, but also by many other features, defy conventional analysis and annotation methods. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. Distributions of the large clusters were illustrated on organism composition, functional class, and sample locations. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project

eScholarship - University of California

Measuring Global Credibility with Application to Local Sequence Alignment

Author: Andrey Rzhetsky
B-JM Webb
Bobbie-Jo M. Webb-Robertson
BP Carlin
C Webber
Charles E. Lawrence
D Naor
DJ Lipman
HS Booth
HT Mevissen
I Holmes
J Zhu
JP Comet
JS Liu
KA Perry
KM Chao
L Yu
LE Carvalho
Lee Ann McCue
M Kendall
M Schlosshauer
M Vingron
M Zuker
ME Dayhoff
ML Tress
MS Waterman
R Durbin
RL Ott
S Henikoff
S Karlin
S Miyazawa
SF Altschul
TF Smith
W Thompson
WR Pearson
Y Ding
YK Yu
Publication venue: Public Library of Science
Publication date: 01/05/2008

Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed to produce a prediction, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. For high-D discrete space, these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a (1−α)%, 0≤α≤1, credibility limit is the minimum Hamming distance radius of a hyper-sphere containing (1−α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments
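The credibility limit described in this abstract can be illustrated with a minimal sketch. It assumes the posterior is approximated by a list of sampled solutions, that each solution is encoded as an equal-length sequence so that Hamming distance is defined, and that 1−α is treated as a fraction of the samples. The function names `hamming` and `credibility_limit` are hypothetical and are not taken from the paper.

```python
import math


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def credibility_limit(estimate, samples, alpha):
    """Smallest Hamming radius around `estimate` that contains at least
    a (1 - alpha) fraction of the sampled solutions.

    Assumption: `samples` are equally weighted draws from the posterior.
    """
    if not samples:
        raise ValueError("need at least one posterior sample")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    distances = sorted(hamming(estimate, s) for s in samples)
    needed = math.ceil((1 - alpha) * len(distances))
    if needed == 0:
        return 0
    return distances[needed - 1]


# Toy example: four sampled solutions around a point estimate.
estimate = "AAAA"
samples = ["AAAA", "AAAT", "AATT", "TTTT"]
radius = credibility_limit(estimate, samples, alpha=0.5)
```

A smaller radius at the same α means the posterior mass is concentrated closer to the estimate, which is the sense in which the abstract calls one estimator's limits "tighter" than another's.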

Allelic Gene Structure Variations in Anopheles gambiae Mosquitoes

Author: AM McGuire
B Modrek
CE Pearson
DB Malko
DM Menge
E Birney
EC Swart
EV Kriventseva
F Oduol
G Dimopoulos
Guiyun Yan
H Nagasaki
H Ranson
J Li
J Sambrook
JC Venter
JM Johnson
JMC Ribeiro
Jose M. C. Ribeiro
Juan Valcarcel
Jun Li
L Zheng
LE Maquat
M Pombi
M Wang
MI McCarthy
MJ Gorman
ML Tress
MM Riehle
NN Singh
P Early
PA Estes
PA Sharp
RA Holt
SD Schlueter
SM Gomez
TD Wu
V Nembaware
W Gilbert
WH Majoros
Z Wang
Publication venue: Public Library of Science
Publication date: 01/05/2010

Allelic gene structure variations and alternative splicing are responsible for transcript structure variations. More than 75% of human genes have structural isoforms of transcripts, but to date few studies have been conducted to verify the alternative splicing systematically. The present study used expressed sequence tags (ESTs) and EST tagged SNP patterns to examine the transcript structure variations resulting from allelic gene structure variations in the major human malaria vector, Anopheles gambiae. About 80% of 236,004 available A. gambiae ESTs were successfully aligned to A. gambiae reference genomes. More than 2,340 transcript structure variation events were detected. Because the current A. gambiae annotation is incomplete, we re-annotated the A. gambiae genome with an A. gambiae-specific gene model so that the effect of variations on gene coding could be better evaluated. A total of 15,962 genes were predicted. Among them, 3,873 were novel genes and 12,089 were previously identified genes. The gene completion rate improved from 60% to 84%. Based on EST support, 82.5% of gene structures were predicted correctly. In light of the new annotation, we found that approximately 78% of transcript structure variations were located within the coding sequence (CDS) regions, and >65% of variations in the CDS regions have the same open-reading-frame. The association between transcript structure isoforms and SNPs indicated that more than 28% of transcript structure variation events were contributed by different gene alleles in A. gambiae. We successfully expanded the A. gambiae genome annotation. We predicted and analyzed transcript structure variations in A. gambiae and found that allelic gene structure variation plays a major role in transcript diversity in this important human malaria vector.

eScholarship - University of California

Entropy Measures Quantify Global Splicing Disorders in Cancer

Author: A Singh
A Srebrow
B Pilch
B Tian
C Cheng
C Ghigna
D Martin
D Puthier
Daniel Gautheret
DC Fischer
Denis Puthier
DO Watermann
E Stickeler
GM Hayes
H Jumaa
H Zhang
J Kelso
J Woolard
JM Johnson
JM Stuart
JP Venables
JZ Ni
LF Lareau
LK Zerbe
LM Sturla
M Ashburner
M Roy
M Zavolan
MA Garcia-Blanco
Manuel Ares
ML Tress
P Carninci
P Stoilov
Q Pan
Q Xu
R Karni
R Sorek
S Mazoyer
Samuel Granjeaud
T Maeda
V Le Texier
William Ritchie
Publication venue: Public Library of Science
Publication date: 01/01/2008

Most mammalian genes are able to express several splice variants in a phenomenon known as alternative splicing. Serious alterations of alternative splicing occur in cancer tissues, leading to expression of multiple aberrant splice forms. Most studies of alternative splicing defects have focused on the identification of cancer-specific splice variants as potential therapeutic targets. Here, we examine instead the bulk of non-specific transcript isoforms and analyze their level of disorder using a measure of uncertainty called Shannon's entropy. We compare isoform expression entropy in normal and cancer tissues from the same anatomical site for different classes of transcript variations: alternative splicing, polyadenylation, and transcription initiation. Whereas alternative initiation and polyadenylation show no significant gain or loss of entropy between normal and cancer tissues, alternative splicing shows highly significant entropy gains for 13 of the 27 cancers studied. This entropy gain is characterized by a flattening in the expression profile of normal isoforms and is correlated to the level of estimated cellular proliferation in the cancer tissue. Interestingly, the genes that present the highest entropy gain are enriched in splicing factors. We provide here the first quantitative estimate of splicing disruption in cancer. The expression of normal splice variants is widely and significantly disrupted in at least half of the cancers studied. We postulate that such splicing disorders may develop in part from splicing alteration in key splice factors, which in turn significantly impact multiple target genes
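As a rough illustration of the entropy measure mentioned in this abstract, the sketch below computes Shannon's entropy (in bits) over the relative expression of a gene's isoforms. The function name `shannon_entropy` and the expression values are made up for illustration; the paper's actual normalization and data are not given in the abstract.

```python
import math


def shannon_entropy(expression_levels):
    """Shannon entropy (bits) of the isoform expression distribution.

    Expression levels are normalized to proportions; isoforms with zero
    expression contribute nothing to the sum.
    """
    total = sum(expression_levels)
    if total <= 0:
        raise ValueError("total expression must be positive")
    entropy = 0.0
    for level in expression_levels:
        if level > 0:
            p = level / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical isoform expression for one gene in two tissues.
normal = [90, 5, 5]    # one dominant isoform: low entropy
tumor = [40, 30, 30]   # flatter profile: higher entropy

entropy_gain = shannon_entropy(tumor) - shannon_entropy(normal)
```

In this toy case the flatter tumor profile yields a positive `entropy_gain`, matching the abstract's description of entropy gain as a flattening of the isoform expression profile.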

CiteSeerX

HAL AMU

HAL-Inserm